The
Teradata database provides several Application Utilities for processing
large numbers of INSERTs, UPDATEs, and DELETEs in a batch environment.
Each utility exploits the capabilities provided by the Teradata parallel
architecture for a specific data maintenance or batch-processing
activity. Teradata application utilities are supported on several
hardware platforms including a wide range of channel-connected
mainframes. Regardless of the host platform, however, all access between
the host and the Teradata database relies on the Call Level Interface
(CLI), a series of callable subroutines that reside in the host's
address space. CLI is responsible for creating and managing the parcels
that travel back and forth between Teradata and the host. It permits the
host to send multiple tasks (sessions) to Teradata at the same time.
CLI is the vehicle that makes parallel access possible.
Teradata has been moving data into and out of the largest data warehouses in the world for close to two decades. While heavy loads have brought other databases to their knees, Teradata has continued to set the standard and break new barriers. The brilliance behind the Teradata load utilities is in their power and flexibility. With six great utilities, Teradata allows you to pick the right utility for the task at hand.
Here is an introduction to the six Teradata utilities, starting with the first three: BTEQ, FastLoad, and MultiLoad. BTEQ was the first Teradata query tool and the first utility. It was built as a report writer, but it also imports and exports data one row at a time.
FastLoad is used to load empty Teradata tables in 64K blocks. It is a mover and shaker that always feels the need for speed. The catch is that while FastLoad is loading, the target table cannot have secondary indexes, join indexes, triggers, or referential integrity. You can, however, use FastLoad to load the table first and then add your secondary indexes, join indexes, triggers, and referential integrity. The only command FastLoad needs to know is INSERT, because it INSERTs into empty Teradata tables by loading 64K blocks of rows (a single block load could carry hundreds to thousands of rows).
MultiLoad is like FastLoad in that it also loads in 64K blocks, so it too is considered a block utility. (BTEQ is not a block utility because it works a row at a time.) Where FastLoad only understands the word INSERT because it only INSERTs into empty tables, MultiLoad is used to populate populated tables. The idea is to use FastLoad to load an empty table the first time, and then use MultiLoad each time you want to add to the table. MultiLoad understands the words INSERT, UPDATE, DELETE, and UPSERT.
FastExport is another block utility that works in 64K blocks, just like FastLoad and MultiLoad, but FastExport only exports data off of Teradata. The only word FastExport understands is SELECT. You SELECT the data from the table, and FastExport exports it off Teradata in 64K blocks.
TPump is one of the most exciting utilities.
It works a row at a time so it is slower than FastLoad or MultiLoad,
but you can have Secondary Indexes, Join Indexes, Referential Integrity
and Triggers on your table while you load it. Why would you
use something slower like TPump when you can rapidly load using
MultiLoad? Because users can continue to query a table while TPump
quietly INSERTS, UPDATES, UPSERTS, or DELETES rows in the background.
Think
of MultiLoad as a noisy train coming down the tracks disrupting
everything in its path and TPump as a quiet truck loading to its
destination.
As
mentioned above, there are efficiencies associated with using large
blocks of data when transferring between computers. So, the logic might
indicate that it is always the best approach. However, there is never
one best approach.
You
will learn that efficiency comes at the price of other database
capabilities. For instance, when using large blocks to transfer and
incorporate data into Teradata the following are not allowed:
1. Secondary indexes
2. Triggers
3. Referential integrity
4. More than 15 concurrent utilities running at the same time
Therefore, it is important to understand when and where these considerations are present.
Before Teradata V2R6.0, when Tera-Tom was just a baby in the crib, the DBS Control parameter MaxLoadTasks had a maximum limit of 15. This meant that no more than 15 block utilities (FastLoad, MultiLoad, and FastExport combined) could run simultaneously. Many companies set this to 5, because these block utilities have a major impact on a system and can greatly affect user query performance.
After Teradata V2R6.0, Teradata increased and changed this number, and it no longer includes FastExport. Let me explain. Now there can be up to 30 concurrent FastLoad and MultiLoad jobs, but remember that it is up to each individual company to determine whether this is too many because of the performance hit. Up to 60 FastExport jobs can run concurrently; the only caveat is that the 60 FastExports share the limit with the block loaders (60 minus the number of active FastLoad and MultiLoad jobs also running). This new feature is controlled by a new DBS Control parameter named MaxLoadAWT, which controls AMP Worker Tasks (AWTs). When MaxLoadAWT is set to zero, it is like going back in time to pre-V2R6.0, where only 15 FastLoad, MultiLoad, and FastExport jobs can run at most.
When MaxLoadAWT is greater than zero, the new feature is active. Each AMP can perform 80 tasks at once, meaning there are 80 AMP Worker Tasks per AMP. MaxLoadAWT should never exceed 48, or the AMPs would not be able to do much else during the load.
Block-level utilities have speed, but also many restrictions. The opposite of sending a large block of rows at the same time is sending a single row at a time. The primary difference between these approaches is speed: it is always faster to send multiple rows in one operation than one row at a time.
If
it is slower, why would anyone ever use this approach? The reason is
that it provides more flexibility with fewer considerations. By this, we
mean that the row at a time utilities allow the following:
1. Secondary indexes
2. Triggers
3. Referential integrity
4. More than 15 concurrent utilities running at the same time
As
you can see, they allow all the things that the block utilities do not.
With that in mind and for more information, continue reading about the
individual utilities and open up a new world of capabilities in working
with the Teradata RDBMS.
Fast Path Inserts inside the Teradata Database
The
load utilities such as BTEQ, FastLoad, MultiLoad, TPump and TPT are
designed to import or export data to and from Teradata, but it is also
important that you understand that once you are inside Teradata you can
use an INSERT SELECT from one Teradata table to another and get great
speed.
If the target table you are loading starts out empty, there isn't a large amount of writing to the Transient Journal. The Transient Journal is designed to roll back bad transactions, but since the table starts empty there is only one entry written to the Transient Journal, and then it sits idle. If the Transient Journal needs to roll back the table, it simply empties it back to the way it started.
For
the Fast Path to be taken both the target and the source table must
have the same Primary Index. That way no data has to be moved across the
AMPs via the BYNET and Teradata can just copy and insert the blocks
directly. This is why it is called the Fast Path.
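As a sketch (the table names here are hypothetical), a Fast Path INSERT/SELECT looks like ordinary SQL; the speed comes from the empty target and the matching Primary Index:

```
/* Assumes Customer_History starts empty and has the same Primary Index
   as Customer_Table, so blocks are copied directly on each AMP and the
   Transient Journal stays idle after one entry. */
INSERT INTO SQL_Class.Customer_History
SELECT * FROM SQL_Class.Customer_Table;
```

If the target is not empty or the Primary Indexes differ, the same statement still works, but rows are inserted one at a time with full journaling.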
BTEQ
Why
is BTEQ available on every Teradata system ever built? Because the
Batch TEradata Query (BTEQ) tool was the original way that SQL was
submitted to Teradata as a means of getting an answer set in a desired
format. Here is what is excellent about BTEQ:
- BTEQ can be used to submit SQL in either a batch or interactive environment. Interactive users can submit SQL and receive an answer set on the screen. Users can also submit BTEQ jobs from batch scripts, have error checking and conditional logic, and allow for the work to be done in the background.
- BTEQ outputs a report format, whereas SQL Assistant outputs data in a format more like a spreadsheet. This gives BTEQ a great deal of flexibility in formatting data, creating headings, and utilizing Teradata extensions, such as WITH and WITH...BY, which SQL Assistant has problems handling.
- BTEQ is often used to submit SQL, but is also an excellent tool for importing and exporting data.
Importing Data:
Data can be read from a file on either a mainframe or LAN attached
computer and used for substitution directly into any Teradata SQL using
the INSERT, UPDATE or DELETE statements.
Exporting Data:
Data can be written to either a mainframe or LAN attached computer
using a SELECT from Teradata. You can also pick the format you desire
ranging from data files to printed reports to spread sheet formats.
Logging into BTEQ
When
logging onto BTEQ in interactive mode you will type .LOGON followed by
your TDP-ID. The TDP-ID identifies your system. You could have multiple
Teradata systems, such as a production system and a test system. The TDP
is the name of the system you are logging onto.
Then
you enter your User-ID. You will then be prompted for your password,
which is hidden from view as you type it in. Remember that BTEQ commands
begin with a period (.) and do not require a semi-colon (;) to end the
statement. SQL commands do not ever start with a period and they must
always be terminated with a semi-colon.
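As a sketch (the TDP-ID ProdTD and User-ID SQL_User are hypothetical), an interactive session looks like this; BTEQ prompts for the password and hides it as you type:

```
.LOGON ProdTD/SQL_User
Password: ********

SELECT * FROM SQL_Class.Employee_Table;

.LOGOFF
.QUIT
```

Notice the pattern described above: the BTEQ commands start with a period and need no semi-colon, while the SQL statement has no period and must end with one.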
Using BTEQ to submit queries in Interactive Mode
Once
you logon to Teradata through BTEQ, you are ready to run your queries.
Teradata knows the SQL is finished when it finds a semi-colon, so don’t
forget to put one at the end of your query.
Why
do you need to place a semi-colon behind all SQL? That is how Teradata
knows the BTEQ query is finished. You might write a very long query and
have to hit ENTER many times. But once you hit ENTER and Teradata sees a
semicolon it is like hitting EXECUTE on SQL Assistant or the Nexus
Query Chameleon.
The first query merely makes SQL_Class the default database. The second query selects from the Employee_Table. Then you see your result set. Take a look at the SQL in the following figure. It does a simple SELECT * FROM Employee_Table, but also notice the WITH SUM(Salary) BY Dept_No.
This
is the WITH BY statement that will allow for detail rows being reported
with Sub-Totals on Dept_No breaks. The WITH BY Statement only works in
BTEQ. Notice that each detail line shows information about a particular
individual in a particular department number. Then when there is a new
department number the report will break and show the SUM (Salary) for
that particular department. This is an example of the WITH BY Statement
showing subtotals.
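Since the figure itself is not reproduced here, a sketch of such a query (column names follow the examples in this chapter) would be:

```
SELECT  Last_Name,
        Dept_No,
        Salary
FROM    Employee_Table
WITH    SUM(Salary) BY Dept_No;
```

Each detail row prints as usual, and at every change in Dept_No a subtotal line for SUM(Salary) appears. Remember, WITH...BY only works in BTEQ.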
The Four Types of BTEQ Exports
There
are four types of BTEQ Exports. BTEQ allows for multiple techniques to
export data. We usually think of an export as moving data off of
Teradata to a normal flat file.That is example number one and that is
called RECORD Mode.
Sometimes there are NULLs in your data, and if you export them to a mainframe, the mainframe application could run into problems interpreting them. That is why INDICDATA is designed to place indicator bits at the front of the records to flag NULL values.
BTEQ can actually take your SQL output report, headers included, and export it all together so it looks like an electronic report. That is EXPORT REPORT mode, also called Field Mode. Reports are truncated to 254 characters for mainframes and 75 characters for network-attached computers. You can set your report up with REPORTWIDE, which effectively sets the width to 32,765 characters (not supported in all releases). The .SET WIDTH command in BTEQ can be used to set the width to any value from 20 to 65,531.
The
last mode is DIF and this is used when you want certain flat files to
be able to be used by PC applications that utilize the Data Interchange
Format.
A BTEQ Export Script
Exporting a Data File
Exporting a Report File
The following pictures show how to export data to a normal flat file and to a report flat file. The first statement is the LOGON statement; the last statement is the LOGOFF statement. You basically tell BTEQ you are going to export a file. Then you run a SELECT query and BTEQ exports it!
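Since the pictures are not reproduced here, a minimal sketch of the two scripts (logon credentials and file paths are hypothetical) might look like:

```
.LOGON ProdTD/SQL_User,mypassword

/* RECORD mode: raw data, unreadable to the eye */
.EXPORT DATA FILE = C:\Temp\CustData.txt
SELECT * FROM SQL_Class.Customer_Table;
.EXPORT RESET

/* REPORT mode: formatted like an electronic report */
.EXPORT REPORT FILE = C:\Temp\CustReport.txt
SELECT * FROM SQL_Class.Customer_Table;
.EXPORT RESET

.LOGOFF
.QUIT
```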
The data exported to a normal flat file looks like garbage. This is actually perfect; that is how it should look. Notice the report file in the bottom example. It does look perfect, and it is also perfect. This is the difference between our RECORD mode and our REPORT mode.
Output for Record Mode and Report Mode
Report Mode to Export a CSV File
You can use Report Mode to create a Comma Separated Value (CSV) file. This is accomplished with the pipe symbol (just above the ENTER key on your keyboard), or rather two pipe symbols: ||, which means concatenate. If the records are greater than 75 characters (the default width for Report Mode), use the .SET WIDTH command to lengthen it.
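A sketch of such a CSV export (column names and the output path are hypothetical) might look like:

```
.LOGON ProdTD/SQL_User,mypassword
.SET WIDTH 200

.EXPORT REPORT FILE = C:\Temp\Customer.csv
/* || concatenates; the literal commas become the CSV separators */
SELECT  TRIM(CAST(Customer_Number AS VARCHAR(11))) || ',' ||
        TRIM(Customer_Name)                        || ',' ||
        TRIM(Phone_Number)
FROM    SQL_Class.Customer_Table;
.EXPORT RESET

.LOGOFF
.QUIT
```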
Importing from a CSV File
Two Ways to run a BTEQ Script
- COMMAND PROMPT: BTEQ < C:\HOME\PATH\SCRIPT NAME
- Inside BTEQ: .RUN FILE = C:\HOME\PATH\SCRIPT NAME
Exporting large files in a UNIX MP-RAS System
Many a large export has been aborted when using UNIX MP-RAS and BTEQ, because UNIX MP-RAS has always had a 2GB file-size limit. This is taken care of with 64-bit operating systems, but I have a solution for you if you are still using UNIX MP-RAS: just use the keyword AXSMOD in your BTEQ export DATA FILE command. You don't need it unless you are using UNIX MP-RAS and exporting a file larger than 2GB.
BTEQ Import Scripts
The
following figure shows an example of a BTEQ IMPORT Script. We are
taking data from our flat file called C:\Temp\CustData.txt and importing
the records into the SQL_Class.Customer_Table.
A
CSV file is a Comma Separated Value flat file. Each column is separated
by a comma. We have taken our comma separated value flat file called
C:\Temp\var1.txt and we are importing that to our
SQL_Class.Customer_Table. Notice how each column is defined using VARCHAR. This is necessary when using CSV files.
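Since the figures are not reproduced here, a sketch of the record-mode import (logon credentials and column names are hypothetical) might look like:

```
.LOGON ProdTD/SQL_User,mypassword

.IMPORT DATA FILE = C:\Temp\CustData.txt
.QUIET ON
.REPEAT *
USING (Cust_No    INTEGER,
       Cust_Name  CHAR(20),
       Phone      CHAR(10))
INSERT INTO SQL_Class.Customer_Table
VALUES (:Cust_No, :Cust_Name, :Phone);

.LOGOFF
.QUIT
```

For the CSV version, the import line becomes .IMPORT VARTEXT ',' FILE = C:\Temp\var1.txt, and every column in the USING clause must be defined as VARCHAR.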
Multiple Sessions in BTEQ
Teradata
allows you to set multiple sessions in a BTEQ script. However, this
will only work if your SQL is using the Primary Index or a Unique
Secondary Index so UPI, NUPI, and USI are the only options that will
utilize multiple sessions.
The SESSIONS parameter merely tells BTEQ how many times to log on to Teradata in order to take advantage of multiple sessions. You can utilize up to 200 sessions on all types of operating systems as long as you are using at least Teradata V2R5. Typically, BTEQ IMPORTs will take advantage of multiple sessions, but BTEQ EXPORTs will not.
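As a sketch, the .SET SESSIONS command goes before the .LOGON so the extra sessions are established at logon time (names and paths are hypothetical):

```
.SET SESSIONS 8
.LOGON ProdTD/SQL_User,mypassword

/* A Primary Index INSERT, so the work can spread across the 8 sessions */
.IMPORT DATA FILE = C:\Temp\CustData.txt
.REPEAT *
USING (Cust_No INTEGER)
INSERT INTO SQL_Class.Customer_Table (Customer_Number)
VALUES (:Cust_No);

.LOGOFF
.QUIT
```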
TPump
TPump
is the shortened name for the load utility Teradata Parallel Data Pump.
To understand this, you must know how the load utilities move the data.
Both FastLoad and MultiLoad assemble massive volumes of data rows into
64K blocks and then move those blocks. Picture in your mind the way
that huge ice blocks used to be floated down long rivers to large cities
prior to the advent of refrigeration. There they were cut up and
distributed to the people. TPump does NOT move data in the large blocks.
Instead, it loads data one row at a time, using row hash locks. Because
it locks at this level, and not at the table level like MultiLoad,
TPump can make many simultaneous, or concurrent, updates on a table.
Envision
TPump as the water pump on a well; pumping in a very slow, gentle
manner resulting in a steady trickle of water that could be pumped into a
cup. But strong and steady pumping results in a powerful stream of
water that would require a larger container. TPump is a data pump which,
like the water pump, may allow either a trickle-feed of data to flow
into the warehouse or a strong and steady stream. In essence, you may
"throttle" the flow of data based upon your system and business user
requirements.
A Sample TPump Script
The script on the following page follows these steps:
1. Setting up a Logtable
2. Logging onto Teradata
3. Identifying the Target, Work and Error tables
4. Defining the INPUT flat file
5. Defining the DML activities to occur
6. Naming the IMPORT file
7. Telling TPump to use a particular LAYOUT
8. Telling the system to start loading
9. Finishing and logging off of Teradata
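The steps above can be sketched as a minimal TPump script (table names, file path, and credentials are hypothetical, and exact option order may vary by release):

```
.LOGTABLE SQL_Class.Cust_Log;
.LOGON ProdTD/SQL_User,mypassword;

.BEGIN LOAD
       ERRORTABLE SQL_Class.Cust_Err
       ERRLIMIT   5
       PACK       20
       RATE       1000
       SESSIONS   4;

.LAYOUT Cust_Layout;
.FIELD Cust_No    * VARCHAR(11);
.FIELD Cust_Name  * VARCHAR(20);

.DML LABEL Ins_Cust;
INSERT INTO SQL_Class.Customer_Table
VALUES (:Cust_No, :Cust_Name);

.IMPORT INFILE C:\Temp\CustData.txt
        FORMAT VARTEXT ','
        LAYOUT Cust_Layout
        APPLY  Ins_Cust;

.END LOAD;
.LOGOFF;
```

The PACK and RATE settings are the "throttle" described above: they control how many statements travel per request and how many statements are sent per minute.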
Teradata Parallel Transport (TPT)
The
Teradata Parallel Transport (TPT) utility combines BTEQ, FastLoad,
MultiLoad, TPump, and FastExport utilities into one comprehensive
language utility. This allows TPT to insert data to tables, export data
from tables, and update tables.
TPT can simultaneously load data from multiple sources in a single job. It can also execute multiple instances of an operator. It can even export, transform and load one or more tables in a single job. All of this with the ability to perform inline filtering while also being able to transform data means that this single scripting feature adds enormous flexibility and power to the stand-alone utilities.
TPT's major capability is being able to read, transform, and load data in a pipeline: as data is placed into a data buffer, the next operator can start performing its task without waiting for the prior process to complete. This works because TPT places data into streams, which means the data isn't written to disk between steps.
TPT DATA STREAM FLOW
TPT works around the concepts of Operators and Data Streams. One Operator reads the Source data and passes its contents to a data stream, where another Operator is responsible for taking the data stream and loading it to disk. Notice on the following page that we have a flat file as our Source. A Producer Operator, designed to read input, moves the data to a Data Stream. The Consumer Operator, designed to write data to a Teradata table, then loads the data.
The data source or destination can be:
- Database (relational or non-relational)
- Server
- File
- Data Storage device
- Object (images, voice, pictures, text, etc.)
TPT can also perform a wide variety of processes while data is being streamed from source to destination, such as:
- Retrieve, store, and transport objects via parallel data streams
- Merge or split multiple parallel data streams
- Filter, massage, and cleanse data
TPT Operators
There are four types of TPT Operators.
- Producer
- Consumer
- Filter
- Standalone
Producer :- A Producer operator gets data from Teradata or from external sources such as Oracle and writes it to a stream; the idea is to make the data available to other operators.
Consumer :- A Consumer operator takes data from the stream and loads it into Teradata or another external data source.
Filter :- A Filter operator consumes data from a stream, transforms it in some way, and sends it further down the stream. Filter operators perform data selection, validation, cleansing, and condensing.
Standalone :- A Standalone operator is mainly used for DDL commands. Standalone operators are also used as Update operators in a DELETE task.
Read From Multiple Source Files Simultaneously
TPT can take multiple sources, use multiple Producer Operators to read each source, and then use multiple Consumer Operators to write the data from those sources into one Teradata table.
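A minimal TPT job tying a Producer and a Consumer together through a data stream might be sketched like this (schema, operator names, credentials, and file path are all hypothetical):

```
DEFINE JOB Load_Customer
DESCRIPTION 'Sketch: flat file -> data stream -> Teradata table'
(
  DEFINE SCHEMA Cust_Schema
  (
    Cust_No   VARCHAR(11),
    Cust_Name VARCHAR(20)
  );

  /* Producer: reads the delimited flat file into the data stream */
  DEFINE OPERATOR File_Reader
  TYPE DATACONNECTOR PRODUCER
  SCHEMA Cust_Schema
  ATTRIBUTES
  (
    VARCHAR DirectoryPath = 'C:\Temp\',
    VARCHAR FileName      = 'CustData.txt',
    VARCHAR Format        = 'Delimited',
    VARCHAR TextDelimiter = ','
  );

  /* Consumer: takes the stream and loads the Teradata table */
  DEFINE OPERATOR Load_Writer
  TYPE LOAD
  SCHEMA *
  ATTRIBUTES
  (
    VARCHAR TdpId        = 'ProdTD',
    VARCHAR UserName     = 'SQL_User',
    VARCHAR UserPassword = 'mypassword',
    VARCHAR TargetTable  = 'SQL_Class.Customer_Table'
  );

  APPLY ('INSERT INTO SQL_Class.Customer_Table VALUES (:Cust_No, :Cust_Name);')
  TO OPERATOR (Load_Writer)
  SELECT * FROM OPERATOR (File_Reader);
);
```

To read multiple source files at once, you would run multiple instances of the Producer (for example, FROM OPERATOR (File_Reader[2])), all feeding the same Consumer.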